A Semi-supervised Text Clustering Algorithm Based on Pairwise Constraints

Authors

  • Jiang Zhong
  • Gaofeng Dong
  • Ying Zhou
  • Xue Li
  • Longhai Liu
  • Qiang Chen
  • Huaxiang Zhang
Abstract

This paper presents an active learning method that effectively selects pairwise constraints during the clustering procedure, and proposes a novel semi-supervised text clustering algorithm built on this selection method. Because samples on the fuzzy boundary lie far from the cluster centers, they are easily assigned to the wrong clusters. We therefore choose the pairwise constraint points from the fuzzy boundary to guide the clustering process towards an appropriate partition. Experimental results show that, for the same number of pairwise constraints, the proposed algorithm effectively improves text clustering results.
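The abstract does not spell out the selection rule, so the following is only a rough sketch of the general idea of drawing constraint queries from the fuzzy boundary. It assumes fuzzy c-means style memberships, treats a sample as "on the boundary" when its two largest memberships are close, and pairs each such sample with the sample nearest to each of its two most likely cluster centers. The function names, the margin criterion, and the `oracle_labels` array (standing in for a human answering must-link / cannot-link queries) are all illustrative assumptions, not the authors' method.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Fuzzy c-means style membership of each sample in each cluster (assumed model)."""
    # Distances from every sample to every center, shape (n_samples, n_clusters).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # avoid division by zero for a sample lying on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def boundary_samples(U, n_queries):
    """Indices of the samples whose two largest memberships are closest together."""
    top2 = np.sort(U, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:n_queries]

def select_constraints(X, centers, oracle_labels, n_queries=2):
    """Query boundary samples against a representative of their two top clusters."""
    U = fuzzy_memberships(X, centers)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    representatives = d.argmin(axis=0)  # sample closest to each center
    must_link, cannot_link = [], []
    for i in boundary_samples(U, n_queries):
        for c in np.argsort(U[i])[-2:]:  # the two most likely clusters for sample i
            j = representatives[c]
            if i == j:
                continue
            pair = (int(i), int(j))
            # The oracle (hypothetical here) tells us whether the pair shares a class.
            if oracle_labels[i] == oracle_labels[j]:
                must_link.append(pair)
            else:
                cannot_link.append(pair)
    return must_link, cannot_link

# Toy data: two tight groups plus one ambiguous sample between them.
X = np.array([[0.0, 0.0], [0.2, 0.0], [4.0, 0.0], [3.8, 0.0], [2.1, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
oracle_labels = np.array([0, 0, 1, 1, 1])
must_link, cannot_link = select_constraints(X, centers, oracle_labels, n_queries=1)
print(must_link, cannot_link)
```

In this toy run the only queried sample is the ambiguous one at index 4, which is where a constraint is most likely to change the partition. The resulting constraint lists could then be passed to any constrained clustering routine.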


Similar resources

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been used in clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...


On the Comparison of Semi-Supervised Hierarchical Clustering Algorithms in Text Mining Tasks

Semi-supervised clustering approaches have emerged as an option for enhancing clustering results. These algorithms use external information to guide the clustering process. In particular, semi-supervised hierarchical clustering approaches have been explored in many fields in recent years. These algorithms provide efficient and personalized hierarchical overviews of datasets. To the best of th...


Fuzzy Clustering with Pairwise Constraints for Knowledge-Driven Image Categorization

The identification of categories in image databases usually relies on clustering algorithms that only exploit the feature-based similarities between images. The addition of semantic information should help improve the results of the categorization process. Pairwise constraints between some images are easy to provide, even when the user has a very incomplete prior knowledge of the image catego...


Semi-Supervised Clustering with Limited Background Knowledge

In many machine learning domains, there is a large supply of unlabeled data but limited labeled data, which can be expensive to generate. Consequently, semi-supervised learning, learning from a combination of both labeled and unlabeled data, has become a topic of significant recent interest. Our research focus is on semi-supervised clustering, which uses a small amount of supervised data in the...



Publication date: 2011